
Add budget-window batch economy endpoint#3387

Draft
MaxGhenis wants to merge 14 commits into master from codex/budget-window-batch

Conversation

@MaxGhenis
Collaborator

Summary

  • add a dedicated budget-window economy endpoint that batches yearly reform-impact work server-side
  • cap active yearly jobs and return aggregate progress, completed years, queued years, and final totals
  • cover completed, queued, and error flows in the economy service tests
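
As a hedged sketch of the aggregate-progress behavior described above, per-year job states could be rolled up into a single batch summary like this (function and field names are illustrative, not the endpoint's actual contract):

```python
# Hypothetical rollup of per-year job statuses into the batch-level summary
# the budget-window endpoint returns. Names are illustrative only.
def summarize_budget_window(year_statuses: dict[int, str]) -> dict:
    """Aggregate per-year states ("complete"/"queued"/"error") into one summary."""
    completed = sorted(y for y, s in year_statuses.items() if s == "complete")
    queued = sorted(y for y, s in year_statuses.items() if s == "queued")
    errored = sorted(y for y, s in year_statuses.items() if s == "error")
    total = len(year_statuses)
    if errored:
        status = "error"
    elif len(completed) == total:
        status = "complete"
    else:
        status = "computing"
    return {
        "status": status,
        "progress": len(completed) / total if total else 0.0,
        "completed_years": completed,
        "queued_years": queued,
    }
```

A caller polling the endpoint would render `progress` and the year lists directly, switching to the final totals once `status` is `"complete"`.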

Testing

  • FLASK_DEBUG=1 uv run --python 3.12 pytest tests/unit/services/test_economy_service.py
  • uv run --python 3.12 ruff format --check .

Companion app change: PolicyEngine/policyengine-app-v2#930

MaxGhenis and others added 2 commits March 10, 2026 08:44
The axes code path silently discarded all exceptions (`pass`),
causing variables like NJ gross income to return null with no
error trace. Now logs the full traceback via logging.exception().

Fixes #3322

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@codecov

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 74.25% with 103 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.79%. Comparing base (6af28d7) to head (b82ace8).
⚠️ Report is 10 commits behind head on master.

Files with missing lines | Patch % | Lines
policyengine_api/services/economy_service.py | 73.55% | 38 Missing and 35 partials ⚠️
...olicyengine_api/services/reform_impacts_service.py | 73.43% | 14 Missing and 3 partials ⚠️
policyengine_api/routes/economy_routes.py | 85.36% | 3 Missing and 3 partials ⚠️
policyengine_api/api.py | 40.00% | 2 Missing and 1 partial ⚠️
policyengine_api/data/data.py | 62.50% | 3 Missing ⚠️
policyengine_api/country.py | 66.66% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3387      +/-   ##
==========================================
+ Coverage   72.20%   72.79%   +0.59%     
==========================================
  Files          56       56              
  Lines        2425     2827     +402     
  Branches      423      514      +91     
==========================================
+ Hits         1751     2058     +307     
- Misses        595      649      +54     
- Partials       79      120      +41     

☔ View full report in Codecov by Sentry.

@anth-volk
Collaborator

I think it is dangerous for request-serving code to issue schema-management SQL (SHOW COLUMNS, ALTER TABLE ...) against production. Please move that to a manual migration step before the PR is pushed, or introduce a migration tool such as Alembic so schema changes are applied explicitly rather than opportunistically during live traffic. That will reduce the chance of production-breaking bugs around deploy order, permissions, or lock contention.

Separately, the new caching / coordination flow feels unnecessarily complex for what should be a standard cache. Using DB rows plus advisory locks plus provisional execution IDs introduces a lot of race-prone state management. Please redo this around Redis instead of SQL locking, so we have a conventional cache / in-flight coordination mechanism rather than custom lock orchestration in request code.
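
The conventional Redis pattern being suggested could be sketched roughly as follows (a sketch, not a proposed implementation; `get_or_compute` and the key names are hypothetical, and `r` stands in for a `redis.Redis`-style client with `get`/`set`/`delete`):

```python
import json

def get_or_compute(r, cache_key, compute, ttl=3600):
    """Standard cache with in-flight coordination via Redis SET NX.

    The first caller atomically claims an in-flight marker and computes;
    concurrent callers see the claim and report that work is in progress.
    The marker's expiry (ex=300) guards against a worker dying mid-compute.
    """
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    if r.set(cache_key + ":in_flight", "1", nx=True, ex=300):
        result = compute()
        r.set(cache_key, json.dumps(result), ex=ttl)
        r.delete(cache_key + ":in_flight")
        return result
    return {"status": "computing"}
```

This replaces custom SQL lock orchestration with two well-understood Redis operations: a cached-result read and an atomic `SET NX` claim.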


@anth-volk left a comment


I think this should be reworked so the orchestration lives in the simulation API repo rather than in API v1.

Benefits:

  • Much less request-time table editing and coordination logic in API v1. We would not need request-serving code to manage provisional rows, execution ID promotion, stale-claim cleanup, or schema-related DB behavior.
  • Better scalability. The simulation API repo is already the natural boundary for spawning and tracking simulation work, and Modal can scale the worker side horizontally for us.
  • Fewer race-condition risks. The current DB-lock/provisional-claim flow is a lot of custom concurrency machinery for what is essentially a batch execution problem.
  • More leverage from Modal. We already get async job spawning, polling by job ID, worker isolation, and container scaling. We should use that instead of rebuilding orchestration in Flask request code.
  • Easier eventual integration into API v2 alpha. If the batch orchestration lives in the simulation API repo, API v1 and API v2 alpha can both consume the same backend capability instead of duplicating orchestration logic in each API layer.

Implementation outline:

In the simulation API repo:

  • Add a batch endpoint for this workflow, e.g. /simulate/economy/comparison/batch or /simulate/economy/budget-window.
  • The request should accept either:
    • a base simulation payload plus start_year, window_size, and max_parallel, or
    • a fully expanded list of yearly payloads plus max_parallel.
  • On submission, create one parent batch job and return a single batch job ID immediately.
  • The batch job should fan out child yearly simulations as separate Modal executions, up to the requested parallelism limit.
  • Track child job IDs and child statuses under the parent batch job.
  • Expose polling for the parent batch job so the caller can retrieve:
    • overall status
    • progress
    • completed years
    • running years
    • queued years
    • failed years / error message
    • final aggregated annual impacts and totals once complete
  • The aggregation logic for the budget window should also live there, so the backend that owns the fan-out also owns the child-job status and final merged result.
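
The fan-out and parent-job tracking outlined above could be sketched as follows. This is Modal-agnostic: a thread pool stands in for spawning child Modal executions, and `simulate_year`, the status names, and the return shape are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_budget_window_batch(simulate_year, start_year, window_size, max_parallel):
    """Parent batch job: fan out one child simulation per year, capped at
    max_parallel concurrent children, tracking child statuses and errors
    under the parent so a poller can see per-year progress."""
    years = range(start_year, start_year + window_size)
    statuses = {y: "queued" for y in years}
    results = {}
    errors = {}
    # max_workers enforces the requested parallelism limit on child jobs.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(simulate_year, y): y for y in years}
        for fut in as_completed(futures):
            y = futures[fut]
            try:
                results[y] = fut.result()
                statuses[y] = "complete"
            except Exception as exc:
                statuses[y] = "error"
                errors[y] = str(exc)
    return {"statuses": statuses, "results": results, "errors": errors}
```

In the real design each child would be a separate Modal execution polled by job ID rather than an in-process future, but the parent-level bookkeeping is the same.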

In API v1:

  • Keep the budget-window route and request parsing.
  • Keep the budget-window response contract, since this PR is already moving toward the right shape for the frontend.
  • Replace the current per-year orchestration with a thin adapter:
    • build one batch request
    • submit it to the simulation API repo
    • store/poll one parent batch job ID
    • map the batch status/result into the existing BudgetWindowEconomicImpactResult response
  • Remove the request-time thread pool, DB advisory locks, provisional execution IDs, stale-claim handling, and per-year reform_impact coordination.
  • If caching is still needed in API v1, use Redis as a standard cache:
    • cache completed batch results by a canonical request key
    • optionally cache “batch job currently in progress for this key”
    • do not use SQL locking as the cache coordination mechanism
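
Caching by a canonical request key, as suggested above, could be sketched like this (the payload fields are illustrative; the real key would cover whatever parameters define the batch request):

```python
import hashlib
import json

def canonical_cache_key(country_id, policy_id, start_year, window_size, options=None):
    """Build a deterministic Redis key: sorted keys and compact separators
    mean semantically identical requests always hash to the same key."""
    payload = {
        "country": country_id,
        "policy": policy_id,
        "start_year": start_year,
        "window_size": window_size,
        "options": options or {},
    }
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    return f"budget_window:{digest}"
```

Completed batch results would be stored under this key with a TTL, and an optional `"<key>:in_flight"` entry could mark a batch job already running for the same request.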

That would leave API v1 as a thin HTTP adapter and move the concurrency/orchestration concerns into the simulation API repo, where Modal is already doing the real execution work. It should be simpler to reason about, easier to scale, and substantially less race-condition-prone than the current lock-based design.

If this direction makes sense, I'd be happy to take it on in a follow-up and will flag @MaxGhenis.
